Identification of Student- and Teacher-Level Variables 
in Modeling Variation of Mathematics Achievement Data 



James E. Tarr, University of Missouri 
Daniel J. Ross, University of Missouri 
Melissa D. McNaught, University of Iowa 
Oscar Chavez, University of Missouri 
Douglas A. Grouws, University of Missouri 
Robert E. Reys, University of Missouri 
Ruthmae Sears, University of Missouri 
R. Didem Taylan, University of Missouri 



Please address all correspondence to: 

James E. Tarr 
University of Missouri 

Department of Learning, Teaching & Curriculum 

121E Townsend Hall 

Columbia, MO 65211-2400 

578-882-4034 

tarrj@missouri.edu 



Paper presented at the 

Annual Meeting of the American Educational Research Association 



Denver, April 30-May 4, 2010 







BACKGROUND 

Perspective 

Based on a long series of international comparisons of student mathematics 
achievement (e.g., TIMSS, PISA) it is clear that US students are not achieving to their 
potential. The reasons for this are obviously complex, but "the TIMSS curricular reports 
suggested that at least part of the problem resided in American curricula, which were seen 
as more skills oriented, more repetitive, and less conceptually deep than those of nations 
that scored better on TIMSS (Schmidt, McKnight, & Raizen, 1997)" (Schoenfeld, 2006, p. 
14). 

Disappointing performance of US students in international studies spurred the 
National Science Foundation to invest in the development of "reform" curricula that 
embodied tenets of the National Council of Teachers of Mathematics’ Curriculum and 
Evaluation Standards for School Mathematics (1989). These NSF-funded curricular 
materials differed from "traditional" mathematics textbooks by integrating several 
branches of mathematics, focusing on the development of mathematical thinking and 
problem solving, and deemphasizing skills and symbol manipulation (see Nathan, Long, & 
Alibali, 2002). Because these new approaches to curriculum organization have only 
recently entered the mainstream, "relatively small numbers of students have worked their 
way through a full reform curriculum... (and) there are scant data regarding the 
effectiveness of these curricula — either on their own merits or in comparison with 
traditional curricula" (Schoenfeld, 2006, p. 15). 



Paper presented at the Annual Meeting of the American Educational Research Association, 
Denver, May 2010. The authors wish to thank Michael Harwell for his methodological 
expertise. This paper is based on research conducted as part of the Comparing Options in 
Secondary Mathematics: Investigating Curriculum (COSMIC) project, a research study 
supported by the National Science Foundation under grant number REC-0532214. Any 
opinions, findings, and conclusions or recommendations expressed in this paper are those 
of the authors and do not necessarily reflect the views of the National Science Foundation. 





MODELING VARIATION OF STUDENT ACHIEVEMENT 




The NSF-funded curricula are indeed controversial, having both outspoken 
advocates and detractors. For more than a decade, mathematics education has endured 
"math wars" in which traditionalists have argued that "standards-based" curricula are 
"superficial and undermine classical mathematical values; reformers claim that such 
curricula reflect a deeper, richer view of mathematics than the traditional curriculum" 
(Schoenfeld, 2006, p. 15) (for a more detailed history of the "math wars," see Schoenfeld, 
2004). To placate teachers, administrators, students, and parents, some school districts 
recently began to offer parallel curriculum paths in which students are presumably "free" 
to study mathematics using one of two organizational schemes, an integrated approach or a 
(traditional) subject-specific approach. It is in the special context of parallel curricular paths 
that we examine curricular effectiveness. 

The COSMIC Project 

Funded by the National Science Foundation, Comparing Options in Secondary 
Mathematics: Investigating Curriculum (COSMIC) is a research project that involves a 
three-year longitudinal comparative study of the effects of integrated mathematics 
curricula and subject-specific mathematics curricula on mathematical learning in schools 
that offer parallel curricular paths. The primary goal of the COSMIC Project is to evaluate 
secondary school students’ mathematics learning using multiple measures of student 
achievement while carefully attending to curriculum implementation via classroom 
observations, opportunity-to-learn (OTL) data, teacher surveys and interviews. Preliminary 
work for the COSMIC project began in 2005 with data collection starting in the Fall of 2006 
and continuing through the 2008-2009 school year. 

RESEARCH QUESTIONS 

Given the large federal investment in NSF-funded curricular materials, their infusion 
into US mathematics classrooms, and the corresponding response by teachers, 
administrators, students and parents, the COSMIC Project sought to answer the following 
research questions: 

1. Are there differential effects on high school students’ mathematics learning when 
they study from an integrated-approach textbook versus a subject-specific 
textbook? In particular, are there differential curricular effects with 
respect to student performance on assessments of: 

o Common objectives; 

o Mathematical reasoning; 

o Mathematics concepts and problem solving. 

2. What are the relationships among curriculum type, fidelity of implementation, and 
student learning? In particular, 

o What curriculum implementation factors are associated with high school 
students’ mathematics learning? 

o What teacher characteristics are associated with high school students’ 
mathematics learning? 

OBJECTIVES OF THIS MANUSCRIPT 

In this paper we address key issues in the design of longitudinal studies of curricular 
effectiveness with particular emphasis on data collection, reduction, and coherence in 
modeling student achievement in year 1 of the COSMIC Project. Reduction and coherence 
are important in the context of large data sets such as ours that include extensive 
information about teachers as well as detailed information about their teaching practice. In 
particular, there is a need to strike a balance between the collection of massive amounts of 
student and teacher data to help explain and understand the student learning that takes 
place and moving forward with developing parsimonious models of the important 
variables that relate to student learning. In this paper, we describe our approach to 
balancing these competing demands and share insights related to year 1 of the COSMIC 
Project. 



SIGNIFICANCE OF THE STUDY 

Developing a clear understanding of the dynamics of parallel curriculum use and a 
comprehension of the factors associated with how and what students learn under different 
curricular approaches is imperative for many reasons. For example, understanding a 
parallel-use context is essential for future curriculum change because parallel programs 
can likely be an intermediate step in curricular change in many schools. Furthermore, if 
and when a scaling up of an integrated content approach occurs in US schools, the 
findings from this study will provide valuable information for the decision making that will 
need to take place as part of such a movement. The improved understanding that this study 
provides (albeit in a special context) concerning the relationships among curricular 
organization, curriculum implementation factors, and gains in student learning will be 
useful to the field in theory building, curriculum writing, professional development, and 
decision making by school administrators. 

THEORETICAL PERSPECTIVES: 

On Evaluating Curricular Effectiveness 

In designing the COSMIC project research we took account of the comprehensive 
framework for evaluating curriculum effectiveness developed by the National Research 
Council (2004) (see Figure 1). As the figure shows, the first step in developing a research 
design is to attend to Program Theory, which essentially means determining program 
components, identifying implementation strategies including processes and contextual 
influences, and deciding on student outcomes to be taken into account. In the COSMIC 
project, the program components of mathematical content and curriculum design elements 
were characterized using a comprehensive content analysis of each of the two curriculum 
types studied. Implementation components, in particular implementation resources and 
processes, were ascertained by curriculum type in two ways: examination of teacher’s 
editions of textbooks and textbook author interviews. Our careful attention to teachers’ 
implementation of curricular materials was necessary in order to draw causal inferences 
between curriculum and student learning; the National Research Council advocates that 
studies of curricular effectiveness account for treatment integrity, or what we refer to as 
fidelity of implementation. Student outcomes were carefully considered for inclusion in this 
study, including multiple assessments, enrollment patterns, attendance, and attrition. 
Because there are multiple important student outcomes but limits on how many 
assessments can be administered in one study, we decided to focus on the most important 
outcome in our view, namely student learning. We annually measured student learning in 
three distinct ways in the study by using what we call a fair test, a mathematical reasoning 
test, and a standardized achievement test. 




Figure 1. Framework for evaluating curricular effectiveness (National Research Council, 
2004). 






SAMPLE 

Curriculum Types 

The COSMIC project studied two curriculum types where the organization of the 
mathematics content differed, namely subject-specific organization and integrated 
organization. Commercially-developed, traditional mathematics textbooks exemplify 
subject-specific curricula and these widely-used textbooks focus on a particular strand of 
mathematical content each year, such as algebra or geometry. Textbook series of Holt, 
Prentice Hall, Glencoe, McDougal Littell, and HRW constitute the sample of subject-specific 
curricula; subject-specific courses include common titles such as Algebra 1, Geometry, 
Algebra 2, and Precalculus. By way of contrast, in the integrated curriculum organization 
multiple strands of mathematical content (geometry, algebra, discrete mathematics and 
statistics) are coalesced. During the 1990s, the National Science Foundation invested 
heavily in the integrated approach to mathematics curricula. Among the curriculum 
development projects at the high school level, Core-Plus emerged as the most popular and 
maintains the greatest market share. In this study, the Core-Plus textbook series was 
selected as the representative for the integrated curricula; students studying from the 
integrated curriculum took Course 1, Course 2, and Course 3. 

A primary difference between the subject-specific curricula and the integrated 
curricula is how lessons are structured. Each subject-specific lesson usually has a Lesson 
Preview, Teach (containing numerous worked examples), Practice and Apply, and closure 
component. Teachers enacting a subject-specific curriculum generally facilitate student 
learning using teacher-led, whole-class discussions. The integrated curriculum is 
structured such that, following a relatively brief Launch, students work in small-group 
settings to Explore mathematical ideas while the teacher serves as facilitator; subsequently, 
students participate in a Share and Summarize component in which they share their 
thinking in a whole-class discussion, and discuss the important mathematical ideas of the 
lesson. In the integrated curricula, a lesson Launch occurs at the beginning of a unit, so 
there may be multiple days in which students engage in Explore and Share and Summarize 
without a Launch component of the lesson. Notwithstanding the preceding, for the 
integrated curricula, the textbook publisher recommends that closure be included in all 
lessons. 

Schools 

Selection method. To identify an appropriate sample, the COSMIC project conducted an 
extensive search for high schools throughout the US that offered parallel curriculum paths 
to their secondary students and left students free to choose either path. In 
particular, we searched for schools that offered both an integrated mathematics curriculum 
and a subject-specific (Algebra 1, Geometry, Algebra 2, Precalculus) curriculum. This 
requirement narrowed the field of possible high schools significantly, but it was a crucial 
requirement for the design of this research study. Satisfying this requirement helped 
ensure that there would be a balance between curriculum types with regard to the number 
of days of instruction, and controlled for many other contextual factors such as homework 
and technology policies, organization and length of class periods, professional development 
provided during the study, SES make-up of the student body, and so forth. Moreover, we 
stipulated that schools were eligible for participation only if students were not tracked; 
that is, schools were ineligible if policies channeled high-performing students into one 
curriculum type while directing lower-performing students into another curriculum. 

We also selected schools that would provide diversity in our sample with regard to 
geographic region, race/ethnicity, and socioeconomic levels. Furthermore, we selected 
only schools that were not using either of their mathematics textbook series for the first 
time. This requirement ensured that most teachers were familiar with, and had experience 
using, the textbooks in our sample. 

Once schools that met our criteria were identified, we visited the schools to talk 
with school representatives. This always included mathematics teachers, mathematics 
chair/coordinator (where they existed) and the school principal. In the majority of cases it 
also required meeting with the district superintendent, and in all of these meetings the 
researchers described the nature of the research and what commitments were required by 
the district (e.g., providing prior achievement test data, allowing researchers to observe 
classes, and committing three days for assessments). We also discussed the benefits the 
district would receive from this research effort, including results from the additional 
assessments, modest honoraria for teachers, and the findings from the research. The latter 
point was particularly persuasive, as all of the schools were interested in research data 
regarding the impact of these two curricular paths on the performance of students. The 
process described above resulted in choosing 11 schools in six school districts that were 
located in five geographically dispersed US states. 

Demographic data. Consistent with the selection criteria, there was diversity in the 
student sample for year 1. As depicted in Table 1, data were collected from 2,621 students, 
with slightly more females comprising the sample than males. While the majority of 
students were White (77.56%), the sample represented a relatively diverse ethnic 
population with the proportion of White students ranging substantially from 50.45% in 
District R to 94.02% in District C. A larger percentage of Black students were reported in 
District W (20.44%) than in other districts, while District R reported nearly 40% of its 
student population as Hispanic. Other races (including Asian/Pacific Islander, Native 
American/Alaskan Native, Mixed Race, and Unclassified) comprised 4.36% of the sample 
but accounted for nearly 7% in District R. 

With regard to characteristics that qualify students for school services and/or 
resources, there was similar diversity across districts. For example, the portion of students 
with Individual Educational Plans (IEP) ranged from 1.66% in District W to 10.12% in 
District B. The percentage of students classified as Limited English Proficiency (LEP) was as 
high as 4.89% in District C, while District I reported none. Eligibility for Free/Reduced Lunch 
(FRL) is commonly used in educational research as a measure of SES despite its limitations 
(Harwell & LeBeau, 2010). In this study, there was a wide range in the percentage of students 
qualifying for FRL, from 19.09% in District R to 53.27% in District I. 







Table 1 

Demographic Data for Each School District in the COSMIC Project, Year 1 

                    Gender (%)          Race/Ethnicity (%)                Qualifications (%) 
District  Students    Male  Female   Black  Hispanic   White  Other[1]     IEP     LEP     FRL 
B              257   46.70   53.30    1.56      6.23   90.27      1.94   10.12    3.11   28.79 
C              184   50.00   50.00    0.00      4.89   94.02      1.09    9.78    4.89   46.20 
I              336   47.62   52.38   10.12      8.04   77.98      3.86    7.74    0.00   53.27 
R              440   47.05   52.73    2.95     39.77   50.45      6.83    4.32    3.86   19.09 
T              802   50.00   50.00    0.62      2.99   92.77      3.62    8.60    1.12   25.31 
W              602   46.35   53.65   20.44      7.31   66.44      5.81    1.66    1.66   25.75 
Totals       2,621   48.04   51.93    6.83     11.25   77.56      4.36    6.41    2.02   29.76 



As depicted in Table 2, the 2,621 students comprising the year 1 sample were 
largely evenly distributed across the two curriculum types, with 48% enrolled in integrated 
and 52% enrolled in subject-specific curriculum. Although the number of students enrolled 
in each curriculum type was similar overall, far more students enrolled in the integrated 
path in District I while the opposite was true for District R. Correspondingly, slightly 
more teachers taught the subject-specific curriculum than taught the integrated 
curriculum. Of the 43 teachers who participated in the COSMIC project, 20 taught the 
integrated curriculum while the remaining 23 taught the subject-specific curriculum; this 
includes a few teachers who taught both curriculum types. 

Table 2 

Number of Teacher and Student Participants in Year 1, by School District and Curriculum 
Type 

                    Teachers                       Students 
District  Integrated  Subject-Specific  Integrated  Subject-Specific 
B                  3                 2         127               130 
C                  1                 2          97                87 
I                  4                 2         286                50 
R                  1                 5          47               393 
T                  6                 4         462               340 
W                  5                 8         237               365 
Total             20                23       1,256             1,365 



Student participation necessitated the writing of three end-of-year exams, 
administered during the final six weeks of year 1. Although discussion of the outcome 
measures is offered subsequently, it is worth noting that 2,615 of 2,621 students took at 
least one test in year 1 of the COSMIC Project, an astonishing participation rate of 99.77%. 
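
The enrollment shares and the participation rate reported above follow directly from the counts in Table 2 and in the preceding paragraph. The short Python check below recomputes them; the variable and function names are illustrative only and are not part of the COSMIC materials.

```python
# Student counts by district, copied from Table 2 (year 1 of the COSMIC Project).
integrated = {"B": 127, "C": 97, "I": 286, "R": 47, "T": 462, "W": 237}
subject_specific = {"B": 130, "C": 87, "I": 50, "R": 393, "T": 340, "W": 365}

total_integrated = sum(integrated.values())
total_subject_specific = sum(subject_specific.values())
total_students = total_integrated + total_subject_specific


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Share of the sample enrolled in each curriculum type.
share_integrated = percent(total_integrated, total_students)
share_subject_specific = percent(total_subject_specific, total_students)

# Students who took at least one end-of-year test, as reported in the text.
tested = 2615
participation_rate = percent(tested, total_students)

print(total_students, share_integrated, share_subject_specific, participation_rate)
```

Rounded to whole percentages, the two shares correspond to the 48% and 52% cited in the text.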

LITERATURE REVIEW 

The notion that NSF-funded curricula are more effective than traditional curricula in 
yielding student mathematical learning is highly controversial (Senk & Thompson, 2003). 



1 Includes Asian/Pacific Islander, American Indian/Alaskan Native, Mixed Race, and Unavailable. 

In a comprehensive review of research on curricular effectiveness, the National Research 
Council (2004) identified numerous methodological limitations of studies on the impact of 
mathematics curriculum on student learning. Among these, the NRC noted that few studies 
utilized an experimental design, included multiple measures of student learning outcomes, 
or were sensitive to treatment integrity (or fidelity of implementation). Moreover, the NRC 
acknowledged a dearth of longitudinal studies of curricular effectiveness. Although the NRC 
advocates additional studies, Cai and Moyer (2006) argue that selecting the optimal way to 
conduct research on the effects of different curricula on student learning is equally 
controversial. 

Most studies measuring the effectiveness of NSF-funded curricula have been 
conducted in the context of field-tests (Senk & Thompson, 2003). Considering the potential 
bias that field tests conducted by the curriculum developers might carry into the studies, 
Harwell et al. (2007) and Post et al. (2008) based their curriculum research on district-wide 
curricula adoptions, where teachers were required to teach the adopted curriculum, 
as opposed to the case of field-test versions of the curricular material. Harwell et al. (2007) 
examined mathematics achievement of secondary students while comparing different 
types of curricula; Post et al. (2008) studied achievement models of middle school students 
who were enrolled in a Standards-based curriculum. Both studies utilized hierarchical 
linear modeling (HLM) to differentiate the effects of student- and classroom-level variables 
that offer predictive power in modeling student achievement. Descriptive data suggest that 
students of low socioeconomic status (SES), African American students, nonnative English 
speakers, and special education students were consistently outperformed by their peers. 

In both Harwell et al. (2007) and Post et al. (2008), student-level variables included 
prior mathematics achievement, SES (i.e., students qualifying for free or reduced-price 
school lunch [FRL]), gender, and attendance. Classroom-level variables included class SES 
level, percent ethnic minority (Black, Asian, and Hispanic), English Language Learners, 
special education students, and female students as well as attendance and school district 
affiliation. Harwell et al. (2007) added curriculum type as a classroom-level predictor. Results 
of the HLM analyses revealed that SES level and prior mathematics achievement consistently 
and strongly predicted mathematics performance at both the student and classroom levels, 
whereas gender and attendance were not found to explain significant variability in 
students’ mathematics performance. Post et al. (2008) found that suburban classrooms 
outscored urban classrooms in mathematics achievement, indicating how school location 
may impact student performance. A key finding by Harwell et al. (2007) was that when all 
variables were taken into account, there was no statistically significant difference among 
different types of curricula. In both of these studies, the teacher-level variable "professional 
development hours" was not a significant predictor of student achievement. However, 
neither of the previous studies was able to carefully assess the extent of curriculum 
implementation in the classroom. 

Schoen et al. (2003) and McCaffrey et al. (2001) investigated effects of teacher 
variables on student achievement. The former was a field test study that examined 
teachers’ preparation, practices and concerns related to students’ mathematics 
achievement in the implementation of the Core-Plus curriculum. This study measured the 
teacher achievement index, which was defined as the mean of each teacher’s students’ 
adjusted mean posttest score (posttest scores after removing the variance due to the 
pretest). Although pretest results revealed that the percentage of free or reduced-price 
lunch (FRL) and the sum of percentages of African American, Native American and Hispanic 
students were strongly and negatively correlated with the pretest achievement, the 
adjusted mean posttest scores were not statistically significantly correlated with any of 
these variables. Using regression analysis, the authors proposed a model of student 
achievement. In contrast to the findings of Harwell et al. (2007), Schoen et al. determined 
that teachers’ completion of a developer-sponsored summer workshop was the most 
significant variable predicting their students’ achievement. Moreover, they identified 
several other variables that were positively and significantly associated with adjusted 
student achievement, including (a) cooperation with other teachers and having confidence 
in teaching, (b) using group work instead of teacher presentation and whole group 
discussion, (c) spending less time on non-academic matters, (d) using a variety of 
assessment methods, (e) not replacing the curriculum materials with the ones that are less 
open-ended and more skill-oriented, (f) high expectations on homework and grading, and 
(g) high observer rating based on the criteria for effective reform teaching. 

Investigating the relationship between teachers’ use of reform-based instructional 
practices and student achievement, McCaffrey et al. (2001) based their study on self-reported 
data collected from NSF-funded and traditional curriculum teachers while also 
taking into account different student variables. Results revealed that teachers’ reported 
use of reform teaching practices was positively correlated with the achievement of the 
students in the integrated mathematics classes while no significant correlation was 
observed for the achievement of students in the traditional mathematics classes. In general, 
students whose teachers had a graduate degree in mathematics or mathematics education 
tended to score higher on achievement tests. Nevertheless, teachers’ level of training was 
excluded from the model due to its possible interaction with teaching practices in the 
classroom. Teacher background variables, defined as their degree, certification status, 
coursework in mathematics, gender, ethnicity and years of teaching experience, were not 
found to have significant predictive power on reported reform practices. 

DESIGN AND DATA SOURCES 
Hierarchical Linear Modeling 

Because students experience the school mathematics curriculum in groups, not as 
individuals, it is not appropriate to use student as the unit of analysis in curriculum 
evaluation studies (National Research Council, 2004; Osborne, 2000). The recognition of 
this fact warrants the use of group means (e.g., class averages, scores aggregated by 
teacher) or multi-level modeling, in which students are nested in hierarchical structures. 
For example, students experience mathematics as a class; several sections of the same class 
are taught by the same teacher; several teachers are nested within the same school; and 
(public) schools are held accountable to the same state curriculum framework. In principle, 
one could argue that students represent the first of many levels in a nested, hierarchical 
educational system. However, modeling student achievement across many levels is 
extraordinarily complex, necessitates a sufficient number of cases at each level, and 
makes interpretation of results particularly challenging. 

Although students experience curriculum as a class, we argue that several classes 
taught by the same teacher are not independent because it is likely many aspects of 
instruction do not vary within the school day. For example, throughout a given school day, 
a high school Algebra 1 teacher is likely to cover the same mathematics content in a single 




MODELING VARIATION OF STUDENT ACHIEVEMENT 



10 



lesson for each class period he teaches Algebra 1 that day. Moreover, the same Algebra 1 
teacher is likely to emphasize the same mathematics (e.g., procedural fluency), spend 
approximately the same amount of time on particular lesson components, and assign the 
same homework from the Algebra 1 textbook. Because the independence of cases is 
fundamental in hypothesis testing, we use teachers (not classes) as the unit of analysis. Our 
design is a two-level model, students nested within teachers, and we seek to identify 
student-level and teacher-level variables for inclusion in models of student achievement. 
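
For readers less familiar with HLM notation, a generic two-level random-intercept specification with students nested within teachers can be sketched as follows. The symbols are assumptions introduced here for illustration and follow common multilevel-modeling conventions: Y_{ij} is the outcome for student i of teacher j, X_{ij} is a student-level predictor such as prior achievement, and W_j is a teacher-level predictor. This is not the fitted COSMIC model, which may include more predictors at either level.

```latex
\begin{align}
  % Level 1: student i nested within teacher j
  Y_{ij} &= \beta_{0j} + \beta_{1} X_{ij} + r_{ij},
    & r_{ij} &\sim N(0, \sigma^{2}) \\
  % Level 2: teacher j
  \beta_{0j} &= \gamma_{00} + \gamma_{01} W_{j} + u_{0j},
    & u_{0j} &\sim N(0, \tau_{00}) \\
  % Share of outcome variance lying between teachers (intraclass correlation)
  \rho &= \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
\end{align}
```

In this notation, a nonzero between-teacher variance \tau_{00} is what makes students of the same teacher non-independent, which is the reason given above for not treating the student as the unit of analysis.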

Independent Variables 

Student-level. Recent studies of curricular effectiveness provide insight into what 
data are essential to collect. At the student-level, it seems requisite to collect data regarding 
prior achievement, gender, race/ethnicity, and other designations such as Individual 
Education Plan (IEP) and Limited English Proficiency (LEP). A complete list of the student 
variables to which we attended appears in Table 3. In studies of curricular effectiveness, it 
is imperative to include measures of student prior achievement in order to (a) establish the 
equivalence of treatment groups, or (b) control for non-equivalence of treatment groups. In 
a subsequent section, we provide detailed narrative on how we generated a common prior 
achievement score across districts in our sample, and data that show that mean student 
prior achievement scores across our two curriculum types are not significantly different. 
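
One simple way to examine whether two treatment groups are comparable on prior achievement is to compare their means with Welch's t statistic. The sketch below uses only the Python standard library. The scores are hypothetical, and the statistic is offered as an illustration of the idea, not as the procedure used in the COSMIC analyses.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two samples.

    Uses sample variances and does not assume equal variances.
    """
    var_a = variance(group_a)  # sample variance (n - 1 denominator)
    var_b = variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical prior-achievement scores on a common scale; NOT COSMIC data.
prior_integrated = [48, 52, 50, 47, 53, 51]
prior_subject_specific = [49, 51, 50, 46, 54, 52]

t_statistic = welch_t(prior_integrated, prior_subject_specific)
print(round(t_statistic, 3))
```

A statistic close to zero, as in this toy example, is consistent with groups that are similar on prior achievement. Because students are nested within teachers, a multilevel comparison would be the more defensible choice with real data.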

Table 3 

Student-level Control Variables 

Control Variable                        Type          Data Source 
COSMIC Prior Mathematics Achievement    Interval      Transformation of scores on state-mandated tests 
Gender                                  Dichotomous   Student Records 
[missing in source]                     [missing]     Student Records 
Individual Education Plan               Dichotomous   Student Records 
Limited English Proficiency             Dichotomous   Student Records 


Teacher-level. At the teacher-level, it is essential to collect information regarding 
characteristics such as experience, hours of professional development, knowledge and 
beliefs. Moreover, in response to the NRC (2004) stipulation that treatment integrity be 
documented in studies of curricular effectiveness, it is clearly necessary to collect data on 
teachers’ implementation of curricular materials, including how the curriculum was 
enacted and what opportunity-to-learn mathematics students were afforded. As reported 
in another paper (see McNaught et al., 2010), we examined the fidelity of implementation of 
curricular materials through two lenses: content fidelity and presentation fidelity. We used 
multiple data sources to gauge teachers’ implementation of curricular materials including 
Table of Contents Records, Textbook-Use Diaries, Initial Teacher Survey, and observations 
using a Classroom Visit Protocol. 

Collectively, nearly 30 variables were measured and these are listed in Table 4. 
Because of the large number of teacher variables, it was hypothesized that several 
attributes might be highly correlated, and hence essentially measuring the same construct. 
For example, consider the variables Seating and Collaboration, from our Classroom Visit 
Protocols. The extent to which students worked collaboratively during observed lessons 
(Collaboration) is likely related to how often students were seated in groups (Seating); 
students tend not to work collaboratively if desks are arranged in rows. Accordingly, 
subsequent analyses were employed to reduce the number of teacher variables to a more 
manageable number. Without doing so, we run the risk of introducing bias and explaining 
far more variance than can be attributed to student- and teacher-level variables. 
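
As an illustration of how redundant teacher-level variables might be flagged before model building, the sketch below computes pairwise Pearson correlations and lists pairs above a cut-off. The data values, the 0.8 threshold, and the function names are hypothetical assumptions. The reduction procedure actually used in the COSMIC Project is not described in this section and may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values, one entry per teacher; NOT COSMIC data.
teacher_vars = {
    "Seating":       [0.0, 0.2, 0.5, 0.8, 1.0, 0.9],
    "Collaboration": [0.1, 0.2, 0.6, 0.7, 1.0, 0.8],
    "Closure":       [0.5, 0.1, 0.9, 0.3, 0.4, 0.6],
}

THRESHOLD = 0.8  # assumed cut-off for "highly correlated"


def redundant_pairs(variables, threshold):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(variables, 2):
        r = pearson(variables[a], variables[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 2)))
    return pairs


flagged = redundant_pairs(teacher_vars, THRESHOLD)
print(flagged)
```

With these toy values only the Seating and Collaboration pair is flagged, mirroring the example in the text. A flagged pair is a candidate for combining into a composite or for dropping one member.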

Table 4 

Teacher-level Variables, by Data Source 

Variable            Description 

TABLE OF CONTENTS RECORDS 

OTL Index           Opportunity to Learn index: the percentage of textbook lessons taught 
ETI Index           Extent of Textbook Implementation index: the extent to which teachers 
                    followed their textbook, using weighted averages 
TCT Index           Textbook Content Taught index: the extent to which teachers, when 
                    teaching textbook content, followed their textbook, supplemented their 
                    textbook lessons, or used altogether alternative curricular materials 

CLASSROOM VISIT PROTOCOLS 

Pres Fidelity       Global rating of presentation fidelity of textbook in observed lessons 
Content Fidelity    Global rating of content fidelity of textbook in observed lessons 
                    Likelihood that teacher utilized graphing calculators in instruction 
                    Likelihood that most students utilized graphing calculators during instruction 
                    Classroom Learning Environment: Reasoning about Mathematics 
                    Classroom Learning Environment: Students’ Thinking in Instruction 
                    Classroom Learning Environment: Sense-Making about Mathematics 
Closure             Relative frequency that teacher brought closure to the observed lessons 
                    Extent to which most students were engaged (on-task) during observed lesson 
Seating             Relative frequency that students were seated in groups during observed lessons 
Collaboration       Relative frequency that students worked collaboratively during observed lessons 
                    Percent of class period devoted to lesson development 
                    Percent of class period devoted to non-instructional time 
                    Percent of class period devoted to practice and apply (homework) 

INITIAL TEACHER SURVEY 

Teach_Exp           Number of years teaching 
Math_Exp            Number of years teaching mathematics 
                    Teacher beliefs about reform-oriented practices 
                    Teacher beliefs about didactic approaches 
                    Teacher beliefs about students’ self-efficacy 
                    Number of hours of professional development in the last 12 months 
                    Number of hours of professional development in the last 3 years 
                    Familiarity with Principles and Standards for School Mathematics (NCTM, 2000) 
                    Agreement with Principles and Standards for School Mathematics (NCTM, 2000) 
                    Implementation of Principles and Standards for School Mathematics (NCTM, 2000) 
Text                Number of years teaching from the district-adopted textbook 
Preparation         Preparedness to teach the district-adopted textbook 
                    Rating of satisfaction with the district-adopted textbook 

TEXTBOOK-USE DIARIES 

                    Number of days spent on target content 




It is also worth noting that we use FRL as a proxy for SES despite its limitations 
(see Harwell & LeBeau, 2010). For at least two reasons, we aggregated the percentage of 
FRL students for each teacher rather than using FRL as a student-level variable. First, FRL 
as a student-level variable often yields extraordinary, arguably unwieldy, slopes that seem 
implausible. Second, District W was unwilling to provide FRL status for individual students 
(but was willing to provide it at the class level), and this decision introduced 
methodological challenges. Accordingly, FRL is treated as a teacher-level covariate. 
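
The aggregation described above can be sketched as follows. The record layout, teacher IDs, and function name are hypothetical. Where a district reports only class-level percentages, that figure would presumably be used directly instead.

```python
from collections import defaultdict


def percent_frl_by_teacher(students):
    """Aggregate student FRL indicators into a percentage for each teacher."""
    counts = defaultdict(lambda: [0, 0])  # teacher -> [FRL count, total count]
    for teacher, is_frl in students:
        counts[teacher][0] += int(is_frl)
        counts[teacher][1] += 1
    return {t: 100.0 * frl / total for t, (frl, total) in counts.items()}


# Hypothetical records: (teacher ID, student FRL status), for illustration only.
records = [
    ("T01", True), ("T01", False), ("T01", False), ("T01", True),
    ("T02", False), ("T02", False), ("T02", True), ("T02", False),
]

frl_pct = percent_frl_by_teacher(records)
print(frl_pct)
```

Each teacher then carries a single FRL percentage that can enter a model as a teacher-level covariate.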

Dependent Variables 

Project-developed tests. Following the recommendations of the NRC (2004), we 
developed assessment instruments using items written around topics common to both 
curriculum programs, with the deliberate goal of not being biased towards either of the two 
curriculum programs. For the three years of the study, we developed five project tests. 
Each test was developed following a cycle of curriculum analyses and several rounds of 
external and internal reviews, pilots, and revisions. For each of the first two years of the 
study, two tests were developed: the first test, Test A (the fair test), consisted of items 
that focused on topics common to the two curriculum types; the second test, Test B, 
assessed students’ mathematical reasoning and problem solving. The items in the 
reasoning tests were based on topics that were appropriate to the grade level, according to 
the content in the textbooks, and as identified during our internal and external reviews. For 
a detailed discussion of the test development process, see Chavez et al. (2010). 

The majority of the items in the fair test (Test A) used in year 1 dealt with linear 
relationships, a topic holding a central position in both Algebra 1 and integrated textbooks. 
The mathematical reasoning test (Test B) included problems on data analysis, algebra, and 
geometry. In year 2 (analyses not reported here), the fair test included some items on 
algebraic topics but focused primarily on geometric topics and concepts common to the two 
curriculum types (e.g., coordinate geometry, perimeter and area, and trigonometry). The 
mathematical reasoning test for year 2 included geometric items and an algebraic item. In 
year 3, we developed only one test, which focused on functions as the central mathematical 
idea. With one or two exceptions, the items in these tests were constructed-response items. 

The scoring rubrics were refined in an iterative manner, following a process parallel 
to the development of the tests. We examined the reliability of our scoring process and the 
results were excellent, with an inter-rater reliability above 94% for all five tests. 
After the tests were administered to 2,621 students in year 1, analyses of scores confirmed 
that the rubrics we developed were applied in a highly reliable manner. 
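
The text does not say which reliability statistic underlies the 94% figure. As a minimal sketch, assuming it is simple percent agreement between two raters scoring the same responses, the computation could look like this (the scores are invented):

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of responses to which two raters assigned identical scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both raters must score the same set of responses.")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


# Hypothetical rubric scores from two raters on ten responses.
rater_1 = [2, 1, 0, 3, 2, 1, 1, 0, 2, 3]
rater_2 = [2, 1, 0, 3, 2, 1, 2, 0, 2, 3]

agreement = percent_agreement(rater_1, rater_2)
print(agreement)
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic would be a different calculation.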

Standardized test. The standardized measure of achievement we selected was the 
Iowa Test of Educational Development (ITED): Mathematics: Concepts and Problem 
Solving. It received high ratings in the Buros Mental Measurements Yearbook (Schafer, 
2005) with regard to reliability and validity, and it has been described as "among the best 
general-purpose assessments of high school students’ educational development available" 
(p. 10). Furthermore, it is nationally normed, which makes it particularly useful in a 
comparative study. Naturally, there are important differences between the ITED and our 
project-developed tests. The ITED is a multiple-choice test. Large-scale assessments that 
rely on multiple-choice items permit only indirect inferences about students’ thinking 
(Silver, Alacaci, & Stylianou, 2000). The items in our project-developed tests are 
constructed-response items. Our scoring rubrics were designed to score separately each 
item’s answer and the work done to get that answer. In this way we have collected ample 
direct evidence regarding students’ performance on more complex problem-solving tasks. 

DATA ANALYSIS 

Student Data 

COSMIC Prior Achievement (CPA). Consistent with our theoretical framework 
(NRC, 2004), our quasi-experimental design required that comparability be established 
by matching samples or making statistical adjustments using, among other factors, prior 
achievement measures. Because a pre-test administered to all students is rarely feasible in 
large-scale studies of curricular effectiveness across multiple states such as ours, we opted 
for a reasonable alternative, namely the use of scores on state-mandated grade 8 
tests, typically administered during the 2004-05 school year. These high-stakes tests 
generally purport to measure student achievement in mathematics at a common point in 
time (grade 8), and so they provided useful information in characterizing student 
knowledge prior to curricular treatments in the COSMIC Project. Nevertheless, state tests 
are usually not nationally normed and are scored using different scales. Consequently, it 
was necessary to put the scores on a common scale that would take into account 
differences across states, as average National Assessment of Educational Progress (NAEP) 
scores vary considerably across states. In particular, because participating school districts 
were located in five US states, it was important to acknowledge and subsequently adjust for 
differences in student achievement across states. For example, given that grade 8 
mathematics students in State X scored above the US average on NAEP while students in 
State B scored below the US average, we mapped each student’s grade 8 state test score in 
mathematics onto the NAEP scale score for grade 8 mathematics. 

In grade 8, some students in District B were assessed using a nationally normed 
mathematics achievement test. In these cases, we simply converted their scores to a 
national z-score, which we then mapped onto a NAEP scale score. For example, a grade 8 
student in State X scoring at the mean (z = 0) was assigned the mean NAEP scale score 
for State X. A student scoring 1 standard deviation above the mean was assigned a NAEP 
scale score that corresponded to the mean NAEP scale score plus 1 standard deviation. 

For the vast majority of students in COSMIC, grade 8 scores on state-mandated tests 
were not nationally normed. In these cases, we converted students’ scores in each state to 
z-scores before mapping these scores onto a NAEP scale score (see National Center for 
Education Statistics, 2007). We called the resulting score the COSMIC Prior Achievement 
score (CPA score). The diagram in Figure 2 illustrates the process. 
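
The two-step conversion just described can be written compactly. The notation is ours and is introduced only for illustration: x_i is student i's state test score, mu_s and sigma_s are the mean and standard deviation of that test in state s, and the NAEP-superscripted quantities are the state's grade 8 NAEP mean and standard deviation.

```latex
z_i = \frac{x_i - \mu_s}{\sigma_s},
\qquad
\mathrm{CPA}_i = \mu_s^{\mathrm{NAEP}} + z_i \,\sigma_s^{\mathrm{NAEP}}
```

With z_i = 0 the student receives the state's mean NAEP scale score, and with z_i = 1 the mean plus one standard deviation, as in the State X illustration above.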








Figure 2. Algorithm for generation of the COSMIC Prior Achievement (CPA) score. 

Consider a second illustrative example in which Student A had a scale score of 709 on the 
2005 grade 8 test mandated in State Y. Because the assessment for State Y is not a 
nationally normed test, we converted this student’s scale score to a state z-score using 
descriptive statistics for the 2005 State Y test: a mean score of 682 and a standard deviation 
of 35. As depicted in Figure 3, this state z-score was then mapped onto the NAEP scale: 
State Y had an average NAEP scale score of 263 and a standard deviation of 34, 
yielding a CPA score of 289. Thus, although Student A scored 0.77 standard deviations 
above the mean relative to grade 8 students in State Y, Student A scored approximately 
0.28 standard deviations above the mean relative to grade 8 students in the US (see Figure 
4). 
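
Student A's numbers can be checked with a few lines of code. This is a sketch of the arithmetic only; the function and parameter names are ours, and the figures are those quoted in the example.

```python
def cpa_score(state_score, state_mean, state_sd, naep_mean, naep_sd):
    """Return the state z-score and the corresponding CPA (NAEP-scale) score."""
    z = (state_score - state_mean) / state_sd
    return z, naep_mean + z * naep_sd


# Student A: State Y test (mean 682, SD 35); State Y NAEP (mean 263, SD 34).
z_state, cpa = cpa_score(709, 682, 35, 263, 34)

print(round(z_state, 2))  # 0.77
print(round(cpa))         # 289
```

Expressing the CPA score relative to all US grade 8 students would additionally require the national NAEP mean and standard deviation, which are not restated here.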




Figure 3. Generating Student A’s COSMIC Prior Achievement (CPA) score. 







Figure 4. Mapping Student A’s state score onto the grade 8 NAEP scale. 

Equivalence of treatment groups: Students. The transformation of student prior 
achievement scores on state-mandated tests to COSMIC Prior Achievement (CPA) scale 
scores yielded an approximately normal distribution across the year 1 sample (see Figure 5). A 
preliminary analysis revealed no significant difference in mean CPA scores 
across curriculum types. Stated differently, while there was substantial variation in prior 
achievement within the year 1 sample, the distribution of student prior achievement was 
comparable across curriculum types. 
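
The preliminary analysis is not specified in detail here. One plausible form, shown purely as a sketch, is a Welch t statistic comparing mean CPA scores between two groups, with a normal approximation to the two-sided p-value (reasonable only for large samples). The scores and group names below are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_test(group_a, group_b):
    """Welch t statistic with a large-sample (normal) two-sided p-value."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    t = (mean(group_a) - mean(group_b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical CPA scores for two curriculum types, for illustration only.
integrated = [262, 275, 281, 290, 268, 301, 279, 284]
subject_specific = [265, 272, 285, 288, 270, 298, 277, 286]

t_stat, p_value = welch_test(integrated, subject_specific)
print(round(t_stat, 3), round(p_value, 3))
```

With samples this small the normal approximation is crude; it is used here only to keep the sketch free of third-party dependencies.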





Figure 5. Distribution of COSMIC Prior Achievement scores, year 1. 



